AI-driven Hypernetwork of Organic Chemistry: Network Statistics and Applications in Reaction Classification
Rapid discovery of new reactions and molecules in recent years has been
facilitated by the advancements in high throughput screening, accessibility to
a much more complex chemical design space, and the development of accurate
molecular modeling frameworks. A holistic study of the growing chemistry
literature is therefore required, one that focuses on understanding recent
trends and extrapolating them into possible future trajectories. To this end,
several network theory-based studies have been reported that use a directed
graph representation of chemical reactions. Here, we perform a study based on
representing chemical reactions as hypergraphs where the hyperedges represent
chemical reactions and nodes represent the participating molecules. We use a
standard reactions dataset to construct a hypernetwork and report its
statistics such as degree distributions, average path length, assortativity or
degree correlations, PageRank centrality, and graph-based clusters (or
communities). We also compute each statistic for an equivalent directed graph
representation of reactions to draw parallels and highlight differences between
the two. To demonstrate the applicability of the hypergraph reaction
representation to AI tasks, we generate dense hypergraph embeddings and use
them in the reaction classification problem. We conclude that the hypernetwork
representation is flexible, preserves reaction context, and uncovers hidden
insights that are otherwise not apparent in a traditional directed graph
representation of chemical reactions.
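
The contrast between the two representations can be illustrated with a minimal sketch. The toy reactions, molecule labels, and function names below are hypothetical and are not taken from the dataset used in the paper; the directed projection shown (one edge from every reactant to every product) is one common convention, assumed here for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy reactions as (reactants, products); not the paper's dataset.
reactions = [
    ({"A", "B"}, {"C"}),
    ({"C", "D"}, {"E", "F"}),
    ({"A", "E"}, {"G"}),
]

def hypergraph_degrees(reactions):
    """Node degree = number of hyperedges (reactions) a molecule takes part in."""
    degree = defaultdict(int)
    for reactants, products in reactions:
        for molecule in reactants | products:
            degree[molecule] += 1
    return dict(degree)

def directed_projection(reactions):
    """Directed-graph view: one edge from every reactant to every product."""
    edges = set()
    for reactants, products in reactions:
        edges.update(product(reactants, products))
    return edges

hyper_deg = hypergraph_degrees(reactions)
edges = directed_projection(reactions)
print(hyper_deg)
print(sorted(edges))
```

In the hypergraph each reaction stays a single hyperedge, so the fact that A and B react together is retained. The pairwise edges of the projection no longer record which reactants co-occur, which is one way to read the "reaction context" mentioned in the abstract.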
Robust and Efficient Swarm Communication Topologies for Hostile Environments
Swarm Intelligence-based optimization techniques combine systematic
exploration of the search space with information available from neighbors and
rely strongly on communication among agents. These algorithms are typically
employed to solve problems where the function landscape is not adequately known
and there are multiple local optima that could result in premature convergence
for other algorithms. Applications of such algorithms can be found in
communication systems involving design of networks for efficient information
dissemination to a target group, targeted drug-delivery where drug molecules
search for the affected site before diffusing, and high-value target
localization with a network of drones. In several such applications, the
agents face a hostile environment that can result in loss of agents during the
search. Such a loss changes the communication topology of the agents and hence
the information available to agents, ultimately influencing the performance of
the algorithm. In this paper, we present a study of the impact of loss of
agents on the performance of such algorithms as a function of the initial
network configuration. We use particle swarm optimization to optimize an
objective function with multiple sub-optimal regions in a hostile environment
and study its performance for a range of network topologies with loss of
agents. The results reveal interesting trade-offs between efficiency,
robustness, and performance for different topologies that are subsequently
leveraged to discover general properties of networks that maximize performance.
Moreover, networks with small-world properties are seen to maximize performance
under hostile conditions.
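
A minimal sketch of the setting described above: local-best particle swarm optimization on a ring topology in which agents can be lost during the search. Everything specific here is an assumption made for illustration, not the paper's setup: the Rastrigin test function, the inertia and acceleration coefficients, the per-step loss probability `loss_prob`, and the rule that a lost agent simply drops out of its neighbors' view.

```python
import math
import random

def rastrigin(x):
    """Multimodal test function; global minimum 0 at the origin."""
    return 10 * len(x) + sum(xi * xi - 10 * math.cos(2 * math.pi * xi) for xi in x)

def ring_neighbors(i, alive, k=1):
    """Surviving agents within k positions of agent i on the original ring (i included)."""
    n = len(alive)
    candidates = [(i + d) % n for d in range(-k, k + 1)]
    return [j for j in candidates if alive[j]]

def pso_with_loss(n_agents=20, dim=2, iters=100, loss_prob=0.005, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.12, 5.12) for _ in range(dim)] for _ in range(n_agents)]
    vel = [[0.0] * dim for _ in range(n_agents)]
    best_pos = [p[:] for p in pos]
    best_val = [rastrigin(p) for p in pos]
    alive = [True] * n_agents
    history = [min(best_val)]
    for _ in range(iters):
        for i in range(n_agents):
            if not alive[i]:
                continue
            # Hostile environment: a surviving agent may be lost at any step
            # (the last remaining agent is never removed).
            if sum(alive) > 1 and rng.random() < loss_prob:
                alive[i] = False
                continue
            # Social information comes only from surviving ring neighbors.
            j = min(ring_neighbors(i, alive), key=lambda a: best_val[a])
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.72 * vel[i][d]
                             + 1.49 * r1 * (best_pos[i][d] - pos[i][d])
                             + 1.49 * r2 * (best_pos[j][d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = rastrigin(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
        # Best value still known to the surviving swarm.
        history.append(min(best_val[a] for a in range(n_agents) if alive[a]))
    return history, sum(alive)

history, survivors = pso_with_loss()
print(f"survivors: {survivors}, best value known to the swarm: {history[-1]:.4f}")
```

Note that the recorded best value can get worse when the agent holding it is lost, which is the kind of information loss the abstract refers to. Swapping `ring_neighbors` for another neighborhood function would allow a comparison of topologies under the same loss model.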
Cationic Amino Acids Specific Biomimetic Silicification in Ionic Liquid: A Quest to Understand the Formation of 3-D Structures in Diatoms
The intricate, hierarchical, highly reproducible, and exquisite biosilica structures formed by diatoms have generated great interest in understanding biosilicification processes in nature. This curiosity is driven by the quest to understand nature's complexity, which might enable these elegant natural diatomaceous structures to be reproduced in the laboratory via biomimetics, a feat currently beyond the capabilities of materials scientists. To this end, significant understanding of the biomolecules involved in biosilicification has been gained, wherein cationic peptides and proteins are found to play a key role in the formation of these exquisite structures. Although biochemical factors responsible for silica formation in diatoms have been studied for decades, attempts to mimic biosilica structures similar to those synthesized by diatoms in their natural habitats have not hitherto been successful. This has led to a growing debate over whether the physico-chemical environment surrounding diatoms might play an additional critical role in controlling diatom morphologies. The current study demonstrates a proof of concept for this idea by using cationic amino acids as a catalyst/template/scaffold to attain diatom-like silica morphologies under biomimetic conditions in ionic liquids.
Retrosynthesis Prediction using Grammar-based Neural Machine Translation: An Information-Theoretic Approach
Retrosynthetic prediction is one of the main challenges in chemical synthesis because it requires a search over the space of plausible chemical reactions that often results in complex, multi-step, branched synthesis trees for even moderately complex organic reactions. Here, we propose an approach that performs single-step retrosynthesis prediction using SMILES grammar-based representations in a neural machine translation framework. Information-theoretic analyses of such grammar representations reveal that they are superior to SMILES representations and better suited to machine learning tasks due to their underlying redundancy and high information capacity. We report a top-1 prediction accuracy of 43.8% (syntactic validity 95.6%) and a maximal fragment (MaxFrag) accuracy of 50.4%. Comparing our model’s performance with previous work that used character-based SMILES representations demonstrates a significant reduction in grammatically invalid predictions and improved prediction accuracy. Fewer invalid predictions in both known and unknown reaction class scenarios demonstrate the model’s ability to learn the underlying SMILES grammar efficiently.
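
The abstract does not state which information-theoretic quantities were computed. As a hedged illustration of the general idea, the sketch below measures Shannon entropy and the standard redundancy, 1 - H/H_max, of a token sequence. The character-level tokenization and the example string are assumptions made here; a grammar-based representation would supply production-rule tokens instead.

```python
import math
from collections import Counter

def shannon_entropy(tokens):
    """Empirical Shannon entropy of the token distribution, in bits per token."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def redundancy(tokens):
    """1 - H / H_max, where H_max = log2(number of distinct tokens).

    Returns 0.0 when fewer than two distinct tokens occur (H_max is zero there);
    that convention is a choice made for this sketch.
    """
    n_symbols = len(set(tokens))
    if n_symbols < 2:
        return 0.0
    return 1.0 - shannon_entropy(tokens) / math.log2(n_symbols)

# Hypothetical example: character-level tokens of one SMILES string (acetic acid).
tokens = list("CC(=O)O")
print(f"entropy: {shannon_entropy(tokens):.3f} bits/token")
print(f"redundancy: {redundancy(tokens):.3f}")
```

Running the same two functions on the token streams produced by two different representations of the same molecules would give a like-for-like comparison, under the assumption that both are tokenized consistently.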
G-MATT: Single-step Retrosynthesis Prediction using Molecular Grammar Tree Transformer
In recent years, several template-based and template-free approaches have
been reported for single-step retrosynthesis prediction. Even though many of
these approaches perform well from the standpoint of traditional data-driven
metrics, there is a disconnect between the model architectures used and the
underlying chemistry principles governing retrosynthesis. Here, we propose a
novel chemistry-aware retrosynthesis prediction framework that combines
powerful data-driven models with chemistry knowledge. We report a
tree-to-sequence transformer architecture based on hierarchical SMILES grammar
trees as input containing underlying chemistry information that is otherwise
ignored by models based on purely SMILES-based representations. The proposed
framework, grammar-based molecular attention tree transformer (G-MATT),
achieves significant performance improvements compared to baseline
retrosynthesis models. G-MATT achieves a top-1 accuracy of 51% (top-10 accuracy
of 79.1%), an invalid rate of 1.5%, and a bioactive similarity rate of 74.8%.
Further analyses based on attention maps demonstrate G-MATT's ability to
preserve chemistry knowledge without having to use extremely complex model
architectures.
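
Top-k accuracy and invalid rate, as reported above, can be computed along the following lines. This is a sketch under stated assumptions, not the evaluation code behind the reported numbers: the toy predictions and targets are invented, exact string matching stands in for comparison of canonicalized SMILES, the invalid rate is taken over all ranked candidates, and `balanced_parentheses` is a deliberately crude, hypothetical stand-in for a real SMILES validity check from a cheminformatics toolkit.

```python
def top_k_accuracy(predictions, targets, k):
    """Fraction of targets that appear among the first k ranked predictions."""
    hits = sum(target in ranked[:k] for ranked, target in zip(predictions, targets))
    return hits / len(targets)

def invalid_rate(predictions, is_valid):
    """Fraction of all predicted strings rejected by the supplied validity check."""
    candidates = [smiles for ranked in predictions for smiles in ranked]
    return sum(not is_valid(smiles) for smiles in candidates) / len(candidates)

def balanced_parentheses(smiles):
    """Crude proxy for validity: branch parentheses must be balanced."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0

# Hypothetical ranked predictions (best first) and ground-truth reactants.
predictions = [
    ["CCO", "CC=O"],
    ["c1ccccc1", "C1CCCCC1"],
    ["CC(=O)O", "CC(O"],
]
targets = ["CCO", "C1CCCCC1", "CCN"]

print(top_k_accuracy(predictions, targets, k=1))
print(top_k_accuracy(predictions, targets, k=2))
print(invalid_rate(predictions, balanced_parentheses))
```

Passing the validity check in as a function keeps the metric code independent of any particular toolkit, so a stricter parser can be substituted without changing the rest.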